Robust Data Sampling in Machine Learning: A Game-Theoretic Framework for Training and Validation Data Selection

نویسندگان

چکیده

How to sample training/validation data is an important question for machine learning models, especially when the dataset heterogeneous and skewed. In this paper, we propose a sampling method that robustly selects data. We formulate process as two-player game: trainer aims training so minimize test error, while validator adversarially samples validation can increase error. Robust achieved at game equilibrium. To accelerate searching process, adopt reinforcement aided Monte Carlo trees search (MCTS). apply our car-following modeling problem, complicated scenario with random human driving behavior. Real-world data, Next Generation SIMulation (NGSIM), used validate method, experiment results demonstrate robustness thereby model out-of-sample performance.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Strategies in Game-theoretic Data Interaction

As most database users cannot precisely express their information needs in the form of database queries, it is challenging for database query interfaces to understand and satisfy their intents. Database systems usually improve their understanding of users’ intents by collecting their feedback on the answers to the users’ imprecise and ill-specified queries. Users may also learn to express their...

متن کامل

Towards a game-theoretic framework for text data retrieval

The task of text data retrieval has traditionally been defined as to rank a collection of text documents in response to a query. While this definition has enabled most research progress so far, it does not model accurately the actual retrieval task in a real search engine application, where users tend to be engaged in an interactive process with multipe queries, and optimizing the overall perfo...

متن کامل

NGTSOM: A Novel Data Clustering Algorithm Based on Game Theoretic and Self- Organizing Map

Identifying clusters is an important aspect of data analysis. This paper proposes a noveldata clustering algorithm to increase the clustering accuracy. A novel game theoretic self-organizingmap (NGTSOM ) and neural gas (NG) are used in combination with Competitive Hebbian Learning(CHL) to improve the quality of the map and provide a better vector quantization (VQ) for clusteringdata. Different ...

متن کامل

Machine Learning Models for Housing Prices Forecasting using Registration Data

This article has been compiled to identify the best model of housing price forecasting using machine learning methods with maximum accuracy and minimum error. Five important machine learning algorithms are used to predict housing prices, including Nearest Neighbor Regression Algorithm (KNNR), Support Vector Regression Algorithm (SVR), Random Forest Regression Algorithm (RFR), Extreme Gradient B...

متن کامل

Generalization Analysis for Game-Theoretic Machine Learning

For Internet applications like sponsored search, cautions need to be taken when using machine learning to optimize their mechanisms (e.g., auction) since selfinterested agents in these applications may change their behaviors (and thus the data distribution) in response to the mechanisms. To tackle this problem, a framework called game-theoretic machine learning (GTML) was recently proposed, whi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Games

سال: 2023

ISSN: ['2073-4336']

DOI: https://doi.org/10.3390/g14010013